How information theory can help solve real-world financial problems
So Far We Have Studied…
• The important numerical properties of the empirical correlation (and by extension, covariance) matrix.
Critical Limitations of Correlation
• Despite its virtues, correlation suffers from several critical limitations as a measure of codependency.
Overcoming These Limitations
• In this lecture, we will overcome these limitations by reviewing information theory concepts that underlie many modern marvels.
Information Theory Concepts in Modern Life
• Internet, mobile phones, file compression, video streaming, and encryption.
Why We Looked Beyond Correlation
• None of these inventions would have been possible if researchers had not looked beyond correlations to understand codependency.
🔍 Information Theory Applications in Finance
• As it turns out, information theory in general, and the concept of Shannon’s entropy in particular, also have useful applications in finance. 💡
• The key idea behind entropy is to quantify the amount of uncertainty associated with a random variable. 🔍
• Information theory is also essential to ML, because the primary goal of many ML algorithms is to reduce the amount of uncertainty involved in the solution to a problem. 💯
👉 We will see how Shannon’s entropy, a key concept in information theory, can help solve real-world financial problems! 📈
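The idea can be made concrete in a few lines of R. For a discrete distribution p, Shannon's entropy is \(H(p) = -\sum_i p_i \log_2 p_i\), measured in bits (a minimal sketch; the function name is ours):

```r
# Shannon entropy (in bits) of a discrete probability distribution
shannon_entropy <- function(p) {
  p <- p[p > 0]        # drop zero-probability outcomes (0 * log 0 := 0)
  -sum(p * log2(p))
}

shannon_entropy(c(0.5, 0.5))  # a fair coin carries exactly 1 bit of uncertainty
shannon_entropy(c(0.9, 0.1))  # a biased coin is more predictable: ~0.47 bits
```

A uniform distribution maximises entropy (maximum uncertainty); a degenerate one has entropy zero.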
Consider Two Random Vectors…
• Let X and Y be two random vectors of size T, and a correlation estimate ρ(X,Y), with the only requirement that σ(X,Y) = ρ(X,Y)σ(X)σ(Y).
• σ(X,Y) is the covariance between the two vectors.
• σ(X) and σ(Y) are the standard deviations of X and Y, respectively.
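This defining identity is easy to verify numerically on simulated data (the vectors X and Y below are illustrative):

```r
# Check the identity cov(X, Y) = rho(X, Y) * sd(X) * sd(Y) on simulated vectors
set.seed(42)
X <- rnorm(1000)
Y <- 0.5 * X + rnorm(1000)

all.equal(cov(X, Y), cor(X, Y) * sd(X) * sd(Y))  # TRUE
```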
Pearson’s Correlation…
• Pearson’s correlation is one of several correlation estimates that satisfy these requirements.
The correlation-based distance metric is defined as: \[d(x, y) = \sqrt{2(1 - \rho(x, y))}\] where \(\rho(x, y)\) is the correlation between \(x\) and \(y\).
This metric does satisfy the properties of a distance metric.
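As a sketch, the distance above can be computed and sanity-checked in a couple of lines (the vectors are chosen purely for illustration):

```r
# Correlation-based distance: d(x, y) = sqrt(2 * (1 - rho(x, y)))
# pmax() guards against tiny negative values from floating-point rounding
corr_distance <- function(x, y) sqrt(pmax(0, 2 * (1 - cor(x, y))))

x <- c(1, 2, 3, 4, 5)
corr_distance(x, x)   # identical series:  rho =  1, distance 0
corr_distance(x, -x)  # mirrored series:   rho = -1, distance 2 (the maximum)
```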
📝 Normalisation
🔍 Non-negativity Property
💡 Financial Applications
This property is very useful in finance. For example, we may wish to build a long-only portfolio, where holdings in negatively correlated securities can only offset risk and should therefore be treated as different for diversification purposes.
📝 Aim
\[d_{|\rho|}(X,Y) = \sqrt{1 - | \rho(X,Y) |}\]
💡 Metric Definition
🔍 Advantages
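A minimal sketch of the absolute-correlation distance defined above, under which perfectly anti-correlated series are treated as identical (function name ours; data illustrative):

```r
# Absolute-correlation distance: d(x, y) = sqrt(1 - |rho(x, y)|)
# pmax() guards against tiny negative values from floating-point rounding
abs_corr_distance <- function(x, y) sqrt(pmax(0, 1 - abs(cor(x, y))))

x <- c(1, 2, 3, 4, 5)
abs_corr_distance(x, -x)  # 0: perfect anti-correlation counts as similarity here
abs_corr_distance(x, x)   # 0: perfect positive correlation likewise
```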
# Calculate the mutual information between the two discretised return series (in nats)
mutual_info = mutinformation(discrete_returns1, discrete_returns2, method = "emp")
print(paste("Mutual information (nats):", mutual_info))
[1] "Mutual information (nats): 0.194283857148803"
# Convert the result from nats to bits by dividing by log(2)
mutual_info_bits = mutual_info / log(2)
print(paste("Mutual information (bits):", mutual_info_bits))
[1] "Mutual information (bits): 0.280292357233359"
First, we load the quantmod package, then download stock data for AAPL and MSFT. Next, we calculate the daily returns for each stock.
To apply the infotheo functions, we need to discretize the returns. Then, we calculate the conditional entropy and mutual information.
library(infotheo)
# Discretise returns
discretized_aapl <- discretize(as.numeric(returns_aapl))
discretized_msft <- discretize(as.numeric(returns_msft))
# Mutual Information between AAPL and MSFT returns
mutual_info <- mutinformation(discretized_aapl, discretized_msft, method = "emp")
print(paste("Mutual Information between AAPL and MSFT:", mutual_info))
[1] "Mutual Information between AAPL and MSFT: 0.516904960511519"
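The conditional entropy mentioned above can be computed with infotheo's condentropy(). A self-contained sketch on simulated returns (r1 and r2 are hypothetical stand-ins for the discretised AAPL/MSFT series):

```r
library(infotheo)

# Conditional entropy H(X | Y): the uncertainty left in X once Y is known
set.seed(1)
r1 <- rnorm(500)                    # simulated daily returns, series 1
r2 <- 0.7 * r1 + 0.3 * rnorm(500)   # a correlated second series
d1 <- discretize(r1)
d2 <- discretize(r2)

H_cond <- condentropy(d1, d2, method = "emp")  # H(r1 | r2), in nats
H_marg <- entropy(d1, method = "emp")          # H(r1), in nats
H_cond <= H_marg  # TRUE: conditioning never increases entropy
```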
• It combines the concepts of mutual information and entropy to measure how much two variables or clusterings differ in terms of their information content.
• Data Intensive: requires comprehensive data for accurate calculation and interpretation.
• Complexity: understanding and applying VI can be complex, requiring a solid foundation in information theory.
• Interpretation: interpreting the results of VI analysis in the context of financial decision-making can be challenging.
| Metric | Definition | Relationship | Applications in Finance | Metric Properties |
|---|---|---|---|---|
| Entropy (H) | Measures the uncertainty or unpredictability of a variable’s outcome. | Fundamental to all other metrics as a measure of uncertainty. | Quantifying the unpredictability of financial variables such as asset returns, market volatility. | No, entropy is not a distance measure and does not satisfy metric properties. |
| Mutual Information (I) | Measures the amount of information shared between two variables; how much knowing one reduces uncertainty about the other. | Relates to entropy by quantifying the reduction in uncertainty. | Analyzing dependencies between financial variables, identifying market trends. | No. Mutual information is symmetric, but it is a similarity rather than a distance: I(X,X) = H(X) ≠ 0, and it does not satisfy the triangle inequality. |
| Conditional Entropy (H(X|Y)) | Quantifies the remaining uncertainty in one variable when the state of another is known. | Derived from entropy, showing the reduction in uncertainty given another variable. | Assessing predictability of financial variables given others. | No, conditional entropy is directional and does not satisfy symmetry or the triangle inequality. |
| Cross-Entropy (Hc) | Measures the expected number of bits needed to identify an event from a set, based on a different probability distribution. | Incorporates KL divergence when comparing two distributions. | Evaluating performance of predictive models in finance. | No, cross-entropy is not symmetric and does not satisfy the triangle inequality. |
| Kullback-Leibler Divergence (D_{KL}) | Quantifies the difference between two probability distributions. | Measures the information gain from one distribution to another. | Measuring the discrepancy between model predictions and actual data. | No, KL divergence is not symmetric and does not satisfy the triangle inequality. |
| Variation of Information (VI) | A metric to quantify the dissimilarity between two random variables or clusterings. | Combines entropy and mutual information to measure total unique information. | Comparing financial market segmentations, understanding diversification. | Yes, variation of information is a true metric as it satisfies non-negativity, identity of indiscernibles, symmetry, and the triangle inequality. |
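The identities linking the metrics in the table can be sanity-checked with infotheo (x and y below are arbitrary simulated, discretised variables):

```r
library(infotheo)

# Check:  H(X|Y) = H(X) - I(X;Y)   and   VI(X,Y) = H(X) + H(Y) - 2 I(X;Y)
set.seed(7)
x <- discretize(rnorm(1000))
y <- discretize(rnorm(1000))

H_x  <- entropy(x, method = "emp")
H_y  <- entropy(y, method = "emp")
I_xy <- mutinformation(x, y, method = "emp")

all.equal(condentropy(x, y, method = "emp"), H_x - I_xy)  # TRUE
VI_xy <- H_x + H_y - 2 * I_xy  # non-negative; zero only when x and y determine each other
```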
First, let’s simulate the data:
The Pearson correlation can be directly calculated using the cor() function:
Calculating the Variation of Information (VI) in R for continuous variables directly is not as straightforward because it typically requires the variables to be discretized or involves estimating their joint and individual entropies.
Since there’s no built-in function for VI in base R or common packages, we’ll need to implement it. This involves discretizing the data, calculating the entropies, and then using the VI formula.
library(entropy) # Load the entropy package for entropy calculations
# Assign each observation to one of 20 equal-width bins
x_bins <- cut(x, breaks = 20, labels = FALSE) # Discretise x into 20 bins
y_bins <- cut(y, breaks = 20, labels = FALSE) # Discretise y into 20 bins
# Create contingency table for joint distribution
joint_distribution <- table(x_bins, y_bins)
# Calculate individual entropies
H_x <- entropy(table(x_bins))
H_y <- entropy(table(y_bins))
# Calculate joint entropy
H_xy <- entropy(joint_distribution)
# Calculate mutual information
I_xy <- H_x + H_y - H_xy
# Calculate Variation of Information
VI <- H_x + H_y - 2 * I_xy
print(paste("Variation of Information:", VI))
[1] "Variation of Information: 2.23006570879938"
# For the Pearson correlation, no change is needed
linear_correlation <- cor(x, y)
print(paste("Linear Correlation:", linear_correlation))
[1] "Linear Correlation: -0.00696311389621409"
One common approach is to normalize VI by the joint entropy H(X,Y) or by the sum of the individual entropies H(X)+H(Y), depending on the context and what aspect of the data you’re most interested in:
\[\text{Normalized VI} = \frac{VI(X,Y)}{H(X) + H(Y)}\]
This normalization ensures that the VI lies between 0 and 1, where 0 indicates that the two variables share all their information (each fully determines the other) and 1 indicates that they share no information (independence).
# Assuming H_x, H_y, and I_xy (mutual information) have already been calculated
# Calculate Variation of Information (VI) again for clarity
VI <- H_x + H_y - 2 * I_xy
# Normalize VI to range between 0 and 1
# Normalized VI is VI divided by the sum of the individual entropies (max possible VI)
Normalized_VI <- VI / (H_x + H_y)
print(paste("Variation of Information:", VI))
[1] "Variation of Information: 2.23006570879938"
print(paste("Normalized Variation of Information:", Normalized_VI))
[1] "Normalized Variation of Information: 0.896160537565619"
Interpretation
• Variation of Information (VI): provides a measure of the total unique and shared information between two variables. A lower value indicates more shared information, while a higher value indicates less shared information and more uniqueness.
This normalisation technique does not change the essence of what VI measures but scales its output to a more interpretable range, especially useful when comparing across different pairs of variables or datasets.
Remember, this normalised VI still does not indicate directionality or the type of relationship (linear or non-linear) between the variables, unlike correlation coefficients.
This example simulates a non-linear relationship between (x) and (y), calculates the linear correlation to show its limitation in capturing non-linear dependencies, and calculates the Variation of Information, which is not limited to linear relationships and can capture the total amount of shared and unique information between the variables.
The Pearson correlation coefficient might be low or not significant because it only measures linear relationships. In contrast, the Variation of Information could be higher, reflecting the complexity of the non-linear relationship that the linear correlation fails to capture.
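This contrast can be seen directly with a deterministic but non-linear relationship (simulated data; any symmetric link such as y = x² works):

```r
library(infotheo)

# Pearson correlation vs mutual information on a purely non-linear link
set.seed(123)
x <- runif(2000, -1, 1)
y <- x^2                                   # deterministic, but non-linear

cor(x, y)                                  # near 0: Pearson misses the dependence
mutinformation(discretize(x), discretize(y), method = "emp")  # clearly positive
```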
We will use the RTransferEntropy package (Simon et al. 2019) to investigate. Quantifying information flow between markets and individual stocks is a central tenet of quantitative research.
CAPM prices stocks in this way, imposing a linear unidirectional relationship between individual stocks and the market return
Is this assumption valid?
We will test this assumption using a selection of 10 stocks from the S&P 500
library(RTransferEntropy)
library(future)
library(tidyverse)
# enable parallel processing
plan(multisession) # initialise a multisession environment
data("stocks") # loads data
stocks %>% glimpse()
TE <- stocks %>%
  group_by(ticker) %>%
  group_split(.keep = TRUE) %>%
  map(~ transfer_entropy(x = .x$ret, y = .x$sp500, shuffles = 500, type = "bins", bins = 12))
names(TE) <- unique(stocks$ticker)
save(TE, file = "Estimation_cache/Transfer_Entropy_SP.RData")
Correlations are useful for quantifying the linear codependency between random variables.
However, when variables X and Y are bound by a non-linear relationship, the above distance metric misjudges the similarity of these variables.
For non-linear cases, López de Prado (2020) argues the normalized mutual information is a more appropriate distance metric.
Given that many ML algorithms do not impose a functional form on the data, it makes sense to use them in conjunction with entropy-based features.
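One common way to turn mutual information into such a feature is to normalise it, e.g. NMI = I(X;Y) / min(H(X), H(Y)); this is one of several normalisations in use, and 1 − NMI then behaves like a distance that also captures non-linear codependency (a sketch; the function name is ours):

```r
library(infotheo)

# Normalised mutual information: NMI = I(X;Y) / min(H(X), H(Y))
nmi <- function(x, y) {
  dx <- discretize(x)
  dy <- discretize(y)
  mutinformation(dx, dy, method = "emp") /
    min(entropy(dx, method = "emp"), entropy(dy, method = "emp"))
}

set.seed(99)
x <- rnorm(1000)
nmi(x, x)  # identical variables: NMI = 1, so distance 1 - NMI = 0
```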
AI & Trading